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ABSTRACT:  Molecular  mechanics  and  dynamics  simulations  use  distance  based 
cutoff  approximations  for  faster  computation  of  pairwise  van  der  Waals  and 
electrostatic  energy  terms.  These  approximations  traditionally  use  a  precalculated 
and  periodically  updated  list  of  interacting  atom  pairs,  known  as  the  “nonbonded 
neighborhood  lists”  or  nblists,  in  order  to  reduce  the  overhead  of  finding  atom 
pairs  that  are  within  distance  cutoff.  The  size  of  nblists  grows  linearly  with  the 
number  of  atoms  in  the  system  and  superlinearly  with  the  distance  cutoff,  and  as  a 
result,  they  require  significant  amount  of  memory  for  large  molecular  systems. 

The  high  space  usage  leads  to  poor  cache  performance,  which  slows  computation 
for  large  distance  cutoffs.  Also,  the  high  cost  of  updates  means  that  one  cannot 
afford  to  keep  the  data  structure  always  synchronized  with  the  configuration  of  the 
molecules  when  efficiency  is  at  stake.  We  propose  a  dynamic  octree  data  structure 
for  imphcit  maintenance  of  nblists  using  space  hnear  in  the  number  of  atoms  but 
independent  of  the  distance  cutoff  The  list  can  be  updated  very  efficiently  as  the  coordinates  of  atoms  change  during  the 
simulation.  Unlike  exphcit  nblists,  a  single  octree  works  for  all  distance  cutoffs.  In  addition,  octree  is  a  cache-friendly  data 
structure,  and  hence,  it  is  less  prone  to  cache  miss  slowdowns  on  modern  memory  hierarchies  than  nblists.  Octrees  use  almost  2 
orders  of  magnitude  less  memory,  which  is  crucial  for  simulation  of  large  systems,  and  while  they  are  comparable  in  performance 
to  nblists  when  the  distance  cutoff  is  small,  they  outperform  nblists  for  larger  systems  and  large  cutoffs.  Our  tests  show  that 
octree  implementation  is  approximately  1.5  times  faster  in  practical  use  case  scenarios  as  compared  to  nblists. 
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The  most  memory  and  time-consuming  step  in  molecular 
mechanics  and  molecular  dynamics  simulations  is  the 
calculation  of  the  nonbonded  terms  in  the  energy  function,^ 
which  requires  summations  of  pairwise  van  der  Waals  and 
electrostatics  interactions.  The  computation  times  for  these 
summations  are  proportional  to  the  number  of  interacting  pairs 
of  atoms,  which  grows  quadratically  with  the  total  number  of 
atoms  in  the  molecular  system.  However,  since  the  interactions 
decay  with  distance,  distance-based  truncation  (cutoff) 
approximations  are  widely  used  in  practice  to  trade-off  accuracy 
for  speed,  reducing  the  overall  computation  time  to  a  linear  (or 
nearly  hnear)  function  of  the  number  of  atoms.^  These  cutoff 
approximations  have  traditionally  been  handled  through 
initially  determined  and  periodically  updated  list  of  interacting 
atomic  pairs — the  so-called  nonbonded  neighborhood  lists^  or 
nblists.  The  efficiency  of  molecular  simulations  is  thus  critically 
dependent  on  the  space-  and  time-efficient  maintenance  of 
nblists.  In  the  past,  this  efficiency  had  been  achieved  by  dividing 
the  system  into  smaller  regions  based  on  cutoff  distance"^  or 
chemical  connectivity^  or  the  “By-Clusters-in-Cubes”  (BYCC) 
method.^  The  BYCC  method  combines  the  connectivity  and 
spatial  separation  criteria  by  clustering  atoms  of  the  molecular 
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system.  This  clustering,  however,  is  done  only  once  at  the 
beginning  of  the  molecular  simulation.  The  resulting  clusters 
are  then  placed  inside  a  cubic  grid  with  the  dimension  of  each 
grid  ceU  being  roughly  equal  to  the  nblist’s  cutoff  distance  plus 
the  maximum  cluster  size.  This  cubic  division  facilitates  search 
through  the  occupied  neighboring  cubes  to  determine  the  list  of 
interacting  pairs  of  atoms.  The  BYCC  method  through  its 
reliance  on  cluster— cluster  pairwise  distances  (rather  than 
pairwise  atomic  distances)  is  much  faster  than  its  predeces¬ 
sors."*’*  The  efficiency  of  the  clustering  step  of  BYCC,  based  on 
connectivity  is  obviously  higher  for  larger  molecules.  Since  here 
we  are  interested  in  simulation  of  the  macromolecular  systems 
(e.g.,  several  protein  chains),  we  use  BYCC  as  a  comparison 
benchmark.  Explicit  nblists  can  also  be  constructed  using 
dynamic  packing  grids,*  as  well  as  binary  hierarchy  approaches.^ 
However,  explicit  nblists  have  high  space  requirement,  poor 
cache  performance,  high  update  cost,  and  need  to  be 
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Figure  1.  A  (3,l)-admissible  quadtree  in  which  no  leaf  contains  more  than  1X3  =  3  points  (i.e.,  centers)  and  each  internal  node  contains  more  than 
3/1  =  3  points.  The  quadtree  is  constructed  by  recursive  subdivision  of  the  box  bounding  the  initial  set  of  points  into  quadrants. 


regenerated  explicitly  for  different  distance  cutoffs,  problems 
that  we  now  solve  effectively  in  our  new  approach. 

In  this  paper,  we  develop  an  octree^-based  technique  for 
space-efficient  implicit  maintenance  of  nblists  and  efficient 
evaluation  of  pairwise  nonbonded  energy  terms  under 
conformational  changes  of  the  molecules.  In  the  following 
text,  we  will  use  the  term  nblists  to  indicate  traditional  nbhsts 
method,  as  opposed  to  octrees.  While  nblists  use  space 
superlinear  in  the  distance  cutoff  d,  space  complexity  of  an 
octree  is  independent  of  d,  and  so,  unlike  nblists,  the  same 
octree  can  be  used  for  all  distance  cutoffs.  An  octree  is  a  cache- 
friendly  data  structure,  and  octree-based  energy  evaluations 
incur  far  fewer  cache  misses  compared  to  nblists-based 
evaluations.  Thus,  octree-based  computations  are  less  prone 
to  cache  miss  slowdowns  on  modern  machines  than 
computations  involving  nblists. 

1.  OCTREES  FOR  MAINTAINING  MOLECULES  UNDER 
CONFORMATIONAL  CHANGES 

An  octree®  is  a  tree  data  structure  that  recursively  and 
adaptively  subdivides  the  3D  space  into  octants,  and  is  often 
used  as  a  container  for  Cartesian  space  data.  Here,  we  use 
octrees  to  store  all  atomic  centers  (or  3D  atomic  Cartesian 
coordinates)  of  a  molecular  system,  organized  recursively  based 
on  their  spatial  locality.  Octrees  have  been  used  in  the  past  for 
fast  multipole-like  approximations  of  force  fields  in  particle 
simulations^  and  in  molecular  solvation  energy  and  force 
calculations.^** 

In  this  work,  we  show  that  octrees  can  be  used  for  space- 
efficient  maintenance  of  nonbonded  neighborhood  lists  (i.e., 
nblists)  of  a  molecular  system  and  can  be  updated  very 
efficiently  as  the  configuration  of  the  system  changes.  A  special 
type  of  octree,  as  defined  below,  will  be  used  for  the  purpose. 
Octrees  are  more  space-efficient,  update-efficient,  and  cache- 
efficient  in  implicit  maintenance  of  nblists  compared  to 
explicitly  maintaining  them.  Also,  unlike  explicit  nblists,  a 
single  octree  can  be  used  for  any  distance  cutoff  required  for 
evaluating  pairwise  interactions  among  atoms. 

An  octree  7~  is  called  (TC,  a) -admissible  provided  no  leaf  of 
7”  contains  more  than  ctTC  points  and  each  internal  node  has 
more  than  7C/a  points,  where  7C  >  0  is  an  integer  and  a  >  1. 


Figure  1  shows  an  example  of  a  (3,1) -admissible  quadtree,** 
which  is  a  2D  variant  of  octrees.  The  quadtree  is  constructed  by 
recursive  subdivision  of  the  box  bounding  the  initial  set  of 
points  (i.e.,  circle  centers)  into  nonempty  quadrants.  Observe 
that  in  the  resulting  tree  each  internal  node  contains  more  than 
3/1  =  3  points,  but  no  leaf  contains  more  than  3X1=3. 

1.1.  Construction  of  Octrees.  Given  a  set  P  of  points  (or 
atom  centers)  in  three  dimensions,  representing  a  simulated 
molecular  system,  consisting  of  arbitrary  number  of  chemically 
connected  components,  a  (TC,  a)-admissible  octree  can  be 
constructed  by  first  finding  a  box  (or  cube)  bounding  P  and 
then  recursively  subdividing  the  initial  cube  into  smaller 
nonempty  subcubes  until  each  subcube  encloses  at  most  cfK 
points.  The  straightforward  recursive  algorithm  is  shown  in 
Supporting  Information  Figure  1.  It  can  be  shown  that  such  a 
tree  can  be  constructed  in  O  {n  log  n)  time,  where  n  =  IPI  (see 
Supporting  Information). 

1 .2.  Space  Usage.  In  this  section,  we  show  that  a  (7C,  a)- 
admissible  octree  can  be  modified  easily  to  use  space  linear  in 
the  number  of  atoms  in  the  molecular  system  it  stores.  The 
trick  is  to  use  a  contracted  octree,  which  is  obtained  from  a 
standard  octree  7”  by  directly  connecting  each  node  of  7”  to  its 
nearest  proper  ancestor  with  more  than  one  child. 

Thus,  each  internal  node  of  a  contracted  octree  has  at  least 
two  children.  All  algorithms  we  have  described  so  far  can  be 
easily  modified  to  work  on  contracted  octrees  without  any 
asymptotic  increase  in  running  times.  It  can  be  shown  that  a 
contracted  (7C,  a) -admissible  octree  7”  storing  a  molecular 
system  containing  n  atoms  uses  0(n)  space,  where  7C  is  a 
positive  integer  and  a  >  1  (as  shown  in  the  Supporting 
Information). 

1 .3.  Computing  Interactions.  Given  two  octrees  and 
7”  g  storing  molecules  A  and  B  (possibly  A  =  B  and  thus  T  a- 
'T b),  respectively,  the  pairwise  interactions  between  atoms  of  A 
and  B  can  be  computed  as  follows.  Suppose  we  are  only 
interested  in  computing  interactions  between  a  pair  of  atoms 
provided  their  centers  lie  within  a  given  distance  cutoff  d.  We 
perform  a  simultaneous  recursive  traversal  of  7”^  and  7~ g 
starting  from  their  root  nodes.  Suppose  at  some  point  we  are 
at  node  u  of  7~a  and  node  v  of  T g.  If  the  two  cubes 
corresponding  to  u  and  v  are  separated  by  a  distance  (surface- 
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Figure  2.  Updating  the  (3,1) -admissible  quadtree  from  1:  (a)  The  point  to  be  updated  is  marked  with  solid  black  outUne.  (b)  The  point  is  moved  to 
a  new  location,  and  so  it  moves  out  of  leaf  u  and  internal  node  x  but  remains  inside  node  w  and  all  its  ancestors,  (c)  Node  u  becomes  empty,  and  so, 
it  is  removed.  Node  x  now  contains  only  3  points,  and  so,  it  becomes  a  leaf  The  updated  point  moves  to  the  top-right  quadrant  of  node  w,  which  was 
empty  before,  and  so,  a  new  leaf  v  containing  only  the  updated  point  is  created. 


to-surface)  larger  than  d,  then  there  can  be  no  interaction. 
Otherwise,  if  both  nodes  are  leaves,  we  consider  each  pair  (p  G 
u,  q  E  v),  and  compute  the  interaction  between  p  and  q 
provided  the  two  atom  centers  lie  within  distance  d.  If  neither 
of  the  two  conditions  above  hold,  we  move  to  the  children  of  u 
and/or  v  and  compute  the  interactions  recursively.  Supporting 
Information  Figure  2  shows  the  pseudocode  of  the  algorithm.  It 
can  be  shown  that  all  pairwise  interactions  between  atoms 
within  distance  cutoff  d  can  be  computed  in  O  {nS'  (5d  + 
time,  where  5  is  the  time  to  compute  one  pairwise 
interaction  (see  Supporting  Information).  For  any  fixed  value  of 
d  the  running  time  of  interaction  calculation  increases  with  the 
increase  of  7C.  This  happens  because  the  larger  the  value  of  “K , 
the  larger  the  region  of  unnecessary  exploration.  However,  as 
computing  interactions  is  a  recursive  function,  the  smaller  the 
value  of  7C,  the  deeper  the  recursion  and  the  larger  the 
recursion  overhead.  Hence,  one  must  chose  a  value  of  7C  that 
balances  between  these  two  types  of  overheads. 

1 .4.  Updates.  The  key  novelty  of  our  work  is  that  we  allow 
fast  updates  of  the  octree  under  conformational  changes  of  the 
molecular  system,  during  the  course  of  simulation.  Figure  2 
illustrates  the  basic  idea  behind  such  updates  using  the  (3,1)- 
admissible  quadtree  from  1.  Figure  2a  shows  the  original  state 
of  the  quadtree  with  the  point  (i.e.,  circle  center)  to  be  updated 
marked  with  solid  black  outline.  The  updated  position  of  that 
point  is  shown  in  Figure  2b  but  without  updating  the  structure 
of  the  quadtree.  Observe  that  the  point  has  moved  out  of  leaf  u 
and  internal  node  x  (i.e.,  parent  of  «)  but  remains  inside  node  w 
(i.e.,  parent  of  ^c)  and  all  its  ancestors.  Hence,  in  order  to  keep 
the  structure  of  the  quadtree  correct,  the  point  must  be 
removed  from  nodes  u  and  x.  Figure  2c  shows  the  quadtree 
after  the  removals.  Observe  that  since  leaf  u  is  now  empty  it  no 
longer  appears  in  the  quadtree  and  since  the  number  of  points 
inside  x  is  less  than  4  it  has  now  become  a  leaf  Also  observe 
that  the  updated  point  has  now  moved  to  the  top-right 
quadrant  of  node  w,  which  was  originally  empty.  So,  the  new 
children  (leaf  v)  of  w  are  created  corresponding  to  that 


quadrant  containing  the  updated  point.  Figure  2c  shows  the 
structure  of  the  quadtree  after  the  update.  Detailed  description 
of  the  method  together  with  pseudocode  is  given  in  the 
Supporting  Information. 

We  prove  in  the  Supporting  Information  that  a  (7C,  2)- 
admissible  octree  storing  a  molecular  system  of  n  atoms 
supports  updates  of  atomic  positions  in  O  (log  n)  amortized 
time  each,  where  7C  is  a  positive  integer.  However,  observe  that 
in  molecular  dynamics  simulations  atoms  often  move  very  short 
distances  at  every  step,  and  the  number  of  nodes  on  the 
shortest  path  between  the  leaves  corresponding  to  the  old  and 
the  new  centers  of  the  atom  is  very  small.  The  update  time  will 
often  be  proportional  to  the  length  of  this  path,  and  thus  much 
better  than  O  (log  n). 

2.  RESULTS  AND  DISCUSSIONS 

We  outline  the  experimental  setup  in  section  2.1.  The  BYCC 
method  for  explicit  nblists  was  originally  implemented  in  the 
CHARMM  program.^^  We  reimplemented  the  BYCC  method, 
and  incorporated  it  into  our  molecular  simulation  library  after  a 
rigorous  testing  and  validation  phase  that  verified  comphance 
with  the  CHARMM  implementation.  In  order  to  avoid  the 
costly  update  of  nbhsts  after  each  change  of  atomic  position,  we 
update  them  only  when  at  least  one  atom  has  moved  by  more 
than  0.5  A  since  the  last  update.  The  space  requirements  of 
expheit  nbhsts  and  octrees  are  compared  in  section  2.1.  Section 
2.3  reports  the  running  times  and  cache  performance  of  a 
limited-memory  BFGS  minimizer^^  using  the  two  data 
structures  for  maintaining  molecular  systems  under  conforma¬ 
tional  changes. 

2.1.  Experimental  Setup.  All  algorithms  were  imple¬ 
mented  in  C,  and  tested  on  Intel  Xeon  5680,  using  the  icc 
compiler  with  optimization  flag  “-03”. 

We  ran  our  experiments  on  the  following  six  proteins  of 
varying  sizes  downloaded  from  the  Protein  Data  Bank,^'*^'^*  and 
prepared  using  CHARMM  c34b2:^^'^^  ICLV  (4680  atoms). 
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Figure  3.  Plot  a  compares  the  sizes  of  of  nblists  and  octrees  constructed  for  six  protonated  proteins  containing  4600—71  000  atoms,  nblists  are  built 
for  12,  15,  and  18  A  distance  cutoff.  Plot  b  zooms  in  on  the  octree  curve  in  plot  a. 


2HMI  (13  758  atoms),  ISUV  (25  118  atoms),  2XRP  (33  602 
atoms),  3LUE  (46  200  atoms),  and  2CGT  (70  847  atoms). 

We  have  found  that  (60,  2)  octrees  work  best  for  molecular 
simulation  application. 

In  our  experiments,  we  compare  the  cost  of  maintaining 
molecules  under  conformatonal  changes  using  nblists  and 
octrees.  For  this  purpose,  we  use  these  data  structures  to  store 
molecules  that  undergo  changes  in  atomic  positions  during 
energy  optimization  using  the  limited-memory  BFGS  (LBFGS) 
algorithm.^^  LBFGS  is  a  quasi-Newton  optimization  method 
that  uses  a  limited  memory  variation  of  the  Broyden— Fletcher— 
Goldfarb— Shanno  (BFGS)  algorithm^^  for  solving  nonlinear 
optimization  problems.  We  have  used  our  own  C  port  of  Jorge 
Nocedal’s  original  Fortran  implementation  of  the  LBFGS 
algorithm. 

2.2.  Space  Usage.  For  each  protein  included  in  our 
experiments,  we  constructed  nblists  for  each  integral  distance 
cutoff  ranging  from  9  to  20  A. 

Figure  3  compares  the  sizes  of  these  data  structures.  In 
Figure  3a,  we  plot  the  space  used  by  the  octrees  as  well  as  by 
the  nblists  for  12,  15,  and  18  A  distance  cutoff.  Figure  3b  zooms 
in  on  the  octree  curve  in  Figure  3a.  As  evident  from  the  plots 
both  octrees  and  nblists  for  any  given  distance  cutoff  use  space 
linear  in  the  number  of  atoms  in  the  molecular  system. 
However,  size  of  an  nblist  grows  as  the  distance  cutoff 
increases.  Also,  the  size  of  an  octree  is  much  smaller  than  that 
of  an  nblist  for  the  same  system.  For  the  six  proteins  in  the 
plots,  octree  size  varied  from  55  KB  (for  ICLV)  to  1  MB  (for 
2CGT),  while  nbhst  size  varied  from  3.5  MB  (for  IGLV)  to 
around  50  MB  (for  2CGT)  for  12  A  distance  cutoff,  and  from  9 
MB  to  around  130  MB  for  18  A  cutoff  Overall,  nbhsts  used 
from  45  to  65  times  more  space  than  octrees  for  12  A  cutoff, 
and  the  range  increased  to  120—165  for  18  A. 

It  turns  out  that  for  the  proteins  and  the  d  values  included  in 
our  experiments  3.7nd^'^  is  a  lower  bound  on  the  sizes  of  nblists 
while  1.7nd^'^  is  an  upper  bound.  In  comparison,  octrees  use 
only  ©  (n)  space.  As  can  be  seen,  octrees  are  orders  of 
magnitude  more  memory  efficient  than  nblists. 

2.3.  Running  Times  in  Application  to  Local  Mini¬ 
mization.  Figure  4  compares  the  time  required  to  complete 
100  steps  of  the  LBFGS  minimizer  using  octrees  and  nblists  as 
the  distance  cutoff  for  computing  pairwise  interactions  grows. 


For  smaller  distance  cutoffs  the  overhead  of  recursion  in 
calculation  of  interactions  slows  down  octree-based  computa¬ 
tion.  However,  that  overhead  gradually  diminishes  compared  to 
the  cost  of  actual  computation  as  the  cutoff  value  increases. 
Also,  nblists  constructed  for  large  cutoff  values  are  often  too 
large  to  fit  in  the  cache,  and  as  a  result,  computations  slow 
down  due  to  costly  cache  misses.  Octrees,  on  the  other  hand, 
incur  very  few  cache  misses  as  they  are  often  small  enough  to  fit 
into  the  cache.  Detailed  analysis  of  cache  performance  is 
provided  in  the  Supporting  Information. 

We  have  compared  performance  of  octrees  vs  nbhsts  in 
different  synthetic  scenarios.  We  have  used  several  nonbonded 
forces  with  different  cutoffs. 

Figure  4a  plots  the  ratio  of  the  running  time  of  the  nbhsts- 
based  LBFGS  minimizer  to  that  of  the  octree-based  minimizer 
when  minimizing  van  der  Waals  (vdW)  energy  only.  Octrees 
were  up  to  20%  slower  than  nblists  for  distance  cutoffs  9  A—  1 1 
A,  comparable  to  nblists  for  cutoff  values  12  and  13  A  and  ran 
faster  than  nblists  for  all  systems  when  the  cutoff  value  reached 
14  A.  Supporting  Information  Figure  6  reveals  one  of  the  major 
reasons  behind  the  improving  performance  of  octrees.  As  the 
distance  cutoff  increased,  the  sizes  of  nblists  also  increased, 
leading  to  increasing  number  of  misses  at  various  levels  of  the 
caches.  On  the  other  hand,  much  smaller  sizes  of  octrees  and 
their  cache-friendly  recursive  traversal  for  energy  evaluation 
meant  that  the  number  of  cache  misses  incurred  by  the  octree- 
based  minimizer  grew  at  a  much  slower  rate. 

Figure  4b  is  similar  to  Figure  4a  but  uses  hydrogen  bonding 
(hbond)  energy  in  addition  to  vdW  energy.  The  nblists-based 
minimizer  evaluated  the  hbond  energy  using  the  same  nblist 
constructed  for  evaluating  vdW  energy.  However,  since  the 
same  octree  works  for  aU  distance  cutoffs,  the  octree-based 
minimizer  always  evaluated  the  hbond  energy  using  a  3  A 
distance  cutoff  irrespective  of  what  cutoff  value  was  used  for 
vdW  energy.  Because  of  the  lower  overhead  in  evaluating 
hbond  energy,  octrees  achieved  even  better  speedup  with 
respect  to  nblists  than  in  Figure  4a.  For  example,  for  16  A 
cutoff,  octrees  were  10—30%  faster  than  nbhsts  in  Figure  4a,  but 
in  Figure  4b,  the  speedup  values  improved  to  20—40%. 

Figure  4c  is  for  minimizing  the  sum  of  vdW,  hbond,  and 
Coulomb  electrostatic  (Coul)  energy.  While  the  nblists-based 
minimizer  used  the  same  distance  cutoff  for  evaluating  all  three 
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Figure  4.  Time  required  for  100  steps  of  LBFGS  minimization  involving  nblists  and  octrees:  (a)  minimize  in  (and  hence  compute)  vdW  energy  for 
various  distance  cutoffs,  where  hoth  data  structures  use  the  same  cutoff  value;  (h)  minimize  in  vdW  plus  hhond  energy,  where  nhlists  use  fhe  same 
cutoff  value  (given  in  the  horizontal  axis)  for  hoth  types  of  energy  while  octrees  use  a  fixed  3  A  cutoff  for  hhond  and  the  same  cutoff  as  nhlists  for 
vdW  (given  in  the  horizontal  axis);  (c)  minimize  in  vdW  plus  hhond  plus  Coul,  where  nhhsts  use  the  same  cutoff  value  (given  in  the  horizontal  axis) 
for  ah  types  of  energy  while  octrees  use  a  fixed  3  A  cutoff  for  hhond,  a  fixed  9  A  cutoff  for  vdW,  and  the  same  cutoff  as  nhhsts  for  Coul  (given  in  the 
horizontal  axis). 


types  of  energy,  the  octree-based  minimizer  always  used  a  3  A 
cutoff  for  hbond,  and  9  A  for  vdW.  Octrees  always 
outperformed  nblists  and  ran  80—100%  faster  (i.e.,  1.8— 2.0 
times  faster)  for  16  A  cutoff  (see  Figure  4c). 

Figure  5  shows  a  breakdown  of  the  total  time  spent  in 
minimizing  2CGT  into  the  time  required  for  updating  the  data 
structure  and  the  time  needed  for  computing  the  energy. 
Implementations  based  on  nblists  and  octrees  have  been 
compared.  Observe  that  octree  update  times  are  almost 
insensitive  to  distance  cutoffs  while  time  required  for  updating 
nblists  increases  with  the  cutoff  value.  Even  for  a  9  A  cutoff, 
octrees  can  be  updated  more  than  20  times  faster  than  nblists, 
and  for  20  A,  updating  octrees  can  be  more  than  100  times 
faster.  Though  octree-based  energy  computation  is  slower  than 
nblists  for  smaller  distance  cutoffs,  this  relative  speed  improves 
and  octrees  start  to  beat  nblists  for  large  cutoffs. 


■  CONCLUSION 

We  have  developed  memory  and  cache  efficient  octree 
implementation  of  implicit  nblists,  which  has  almost  2  orders 
of  magnitude  lower  memory  usage  as  compared  to  existing 
implementations.  In  addition  to  tremendous  reduction  in 
memory  requirements  the  code  runs  approximately  1.5  faster 
than  nblists  in  practical  use  case  scenarios.  The  algorithm 
described  in  the  paper  is  suitable  for  effective  parallelization  and 
vectorization,  as  described  and  demonstrated  in  the  Supporting 
Information  section  2.2,  and  thus  can  be  applied  to  Molecular 
Dynamics  (MD).  Current  implementation  is  directly  useful  for 
simulations  requiring  multiple  simultaneous  minimizations, 
such  as  multistart  Monte  Carlo  Minimization  (MCM).  The 
code  is  available  under  open  source  license. 
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Figure  5.  Plots  show  a  breakdown  of  the  total  time  required  for  running  100  steps  of  LBFGS  minimization  on  2CGT  into  update  time  and  energy 
computation  time.  The  nblists  and  octree  implementations  of  the  algorithm  are  compared.  Please  see  4  for  the  cutoff  values  used  in  parts  a,  b,  and  c. 
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